Enhancing VLAM Workflow Model with MapReduce Operations
Authors
Abstract
Objective: Provide an easy-to-use and efficient domain-specific language for defining MapReduce operations in workflows.

Bibliography
[1] A. Belloum, M. Inda, D. Vasunin, V. Korkhov, Z. Zhao, H. Rauwerda, T. Breit, M. Bubak, L. Hertzberger, Collaborative e-science experiments and scientific workflows, IEEE Internet Computing 15 (2011) 39–47.
[2] C. Olston, B. Reed, U. Srivastava, R. Kumar, A. Tomkins, Pig Latin: a not-so-foreign language for data processing, in: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, SIGMOD '08, ACM, New York, NY, USA, 2008, pp. 1099–1110.
[3] R. Pike, S. Dorward, R. Griesemer, S. Quinlan, Interpreting the data: parallel analysis with Sawzall, Scientific Programming 13 (2005) 277.
[4] M. Baranowski, A. Belloum, M. Bubak, M. Malawski, Constructing workflows from script applications, Scientific Programming 20 (2012) 359–377.
[5] M. Baranowski, A. Belloum, M. Bubak, MapReduce operations with WS-VLAM Workflow Management System, Procedia Computer Science 18 (2013) 2599–2602.
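The DSL itself is not reproduced in this abstract. As a rough sketch of the MapReduce model such a language targets, a minimal map/reduce pipeline in plain Python (all helper names here are illustrative, not the VLAM DSL) might look like:

```python
from functools import reduce
from itertools import groupby

def map_phase(records, mapper):
    # Apply the user-defined mapper to every record, emitting (key, value) pairs.
    return [pair for record in records for pair in mapper(record)]

def reduce_phase(pairs, reducer):
    # Group emitted pairs by key, then fold each group's values with the reducer.
    grouped = groupby(sorted(pairs, key=lambda kv: kv[0]), key=lambda kv: kv[0])
    return {key: reduce(reducer, (v for _, v in group)) for key, group in grouped}

# Word count, the canonical MapReduce example.
lines = ["a b a", "b c"]
mapped = map_phase(lines, lambda line: [(w, 1) for w in line.split()])
counts = reduce_phase(mapped, lambda a, b: a + b)
# counts == {"a": 2, "b": 2, "c": 1}
```

A workflow-level DSL would essentially let the user name the mapper and reducer and wire their inputs and outputs, leaving shuffling and grouping to the system.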
Similar papers
MapReduce Operations with WS-VLAM Workflow Management System
Workflow management systems are widely used to solve scientific problems, as they enable orchestration of remote and local services such as database queries, job submission, and running an application. To extend the role that workflow systems play in data-intensive science, we propose a solution that integrates workflow management systems with the MapReduce model. In this paper, we discuss a possible solution for combining MapR...
Improving Current Hadoop MapReduce Workflow and Performance
This study proposes an improvement and implementation of an enhanced Hadoop MapReduce workflow that improves the performance of the current Hadoop MapReduce. This architecture speeds up the processing of Big Data by tuning different parameters of the processing jobs. Big Data needs to be divided into many datasets or blocks and distributed to many nodes within the cluster. Thus, tasks can...
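The block-splitting and distribution step described above can be sketched in plain Python. This is a simplified local analogy using a thread pool, not Hadoop's actual HDFS/YARN machinery, and all names are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_blocks(data, block_size):
    # Divide the dataset into fixed-size blocks, as HDFS does with large files.
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def process_block(block):
    # A stand-in for a per-block map task (here: summing the block).
    return sum(block)

data = list(range(100))
blocks = split_into_blocks(data, block_size=25)

# Each block is handed to a worker, mimicking task distribution across nodes.
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(process_block, blocks))

total = sum(partials)  # combine partial results, analogous to the reduce side
# total == 4950
```

Because blocks are independent, the per-block tasks can run concurrently, which is the source of MapReduce's scalability.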
Bind: a Partitioned Global Workflow Parallel Programming Model
High Performance Computing is notorious for its long and expensive software development cycle. To address this challenge, we present Bind: a "partitioned global workflow" parallel programming model for C++ applications that enables quick prototyping and agile development cycles for high performance computing software targeting heterogeneous distributed manycore architectures. We present applica...
Adaptive Information Passing for Early State Pruning in MapReduce Data Processing Workflows
MapReduce data processing workflows often consist of multiple cycles, where each cycle hosts the execution of some data processing operators (e.g., join) defined in a program. A common situation is that many data items propagated along a workflow end up being "fruitless", i.e., they do not contribute to the final output. Given that the dominant costs associated with MapReduce processin...
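One common way to drop such fruitless items early is a semi-join style filter, where the (small) key set from one join side is passed upstream so non-matching tuples are pruned before the shuffle. The sketch below is illustrative only; the paper's actual adaptive information-passing scheme is not shown in this excerpt:

```python
def semi_join_filter(left, right_keys):
    # Drop left-side tuples whose key cannot match any right-side tuple:
    # these "fruitless" items would otherwise be shuffled and then discarded.
    return [(k, v) for (k, v) in left if k in right_keys]

left = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
right = [(2, "x"), (4, "y")]

# "Information passing": ship the set of right-side keys to the upstream cycle.
right_keys = {k for (k, _) in right}
pruned_left = semi_join_filter(left, right_keys)

# The join now processes only tuples that can contribute to the output.
joined = [(k, lv, rv) for (k, lv) in pruned_left for (rk, rv) in right if rk == k]
# joined == [(2, "b", "x"), (4, "d", "y")]
```

Pruning before the shuffle saves both network transfer and reducer work, which is where MapReduce joins spend most of their cost.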
Constructing gazetteers from volunteered Big Geo-Data based on Hadoop
Traditional gazetteers are built and maintained by authoritative mapping agencies. In the age of Big Data, it is possible to construct gazetteers using a data-driven approach by mining rich volunteered geographic information (VGI) from the Web. In this research, we build a scalable distributed platform and a high-performance geoprocessing workflow based on the Hadoop ecosystem to harvest crowd-sou...